Annals of Internal Medicine
● American College of Physicians
Preprints posted in the last 7 days, ranked by how well they match Annals of Internal Medicine's content profile, based on 27 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.
Neely, M.; Wojdyla, D. M.; Hong, H.; Wang, P.; Anderson, M. R.; Arroyo, K.; Belperio, J.; Benvenuto, L.; Budev, M.; Combs, M.; Dhillon, G.; Hsu, J. Y.; Kalman, L.; Martinu, T.; McDyer, J.; Oyster, M.; Pandya, K.; Reynolds, J. M.; Rim, J. G.; Roe, D. W.; Shah, P. D.; Singer, J. P.; Singer, L.; Snyder, L. P.; Tsuang, W.; Weigt, S. S.; Christie, J. D.; Palmer, S. M.; Todd, J.
Background: We aimed to identify data-driven FEV1 trajectory phenotypes post-chronic lung allograft dysfunction (CLAD), relate these phenotypes to patient factors and future graft loss, and develop a classification approach for prospective patients. Methods: We studied adult first lung recipients with probable CLAD from two prospective multicenter cohorts: CTOT-20 (n=206) and LTOG (n=1418). FEV1 trajectories over the first nine months post-CLAD were characterized using joint latent class mixed models, jointly modelling time-to-graft loss to account for informative censoring. Models were fit independently in both cohorts and also only among LTOG bilateral recipients. A classification and regression tree (CART) model was derived in LTOG bilateral recipients and applied to CTOT-20 bilateral recipients. Findings: Four distinct early FEV1 trajectory classes were identified in CTOT-20, with large differences in nine-month graft loss (72.3%, 31.1%, 2.2%, 0%). In LTOG, similar trajectory patterns were reproduced, with an additional class demonstrating early post-CLAD FEV1 improvement. Among bilateral recipients, trajectory classes showed a clear risk gradient, including a high-risk class with 100% graft loss and a low-risk class with no early graft loss. A CART model incorporating clinical and spirometric variables demonstrated good discrimination in LTOG bilateral recipients (multiclass AUC 0.85) and consistent class assignment and trajectory patterns when applied to CTOT-20. Interpretation: We identified reproducible, clinically meaningful early post-CLAD FEV1 trajectory phenotypes with differential graft loss risk. These phenotypes and a pragmatic classification tool may support risk stratification, trial enrichment, and improved prognostication for patients and clinicians.
Lin, T.; Li, Y.; Huang, Z.; Gui, T. T.; Wang, W.; Guo, Y.
Target trial emulation (TTE) offers a principled way to estimate treatment effects using real-world observational data, but analyses of time-varying treatment strategies remain vulnerable to immortal time bias. The clone-censor-weight (CCW) approach is increasingly used to address this problem, yet key aspects of its causal interpretation and implementation remain unclear. In this work, we emulate a target trial using electronic health records (EHRs) to compare completion of a 3-dose 9-valent human papillomavirus (HPV) vaccine series within 12 months versus remaining partially vaccinated among vaccine initiators. We link CCW to the classic potential outcome framework in causal inference, evaluate the role of different weighting mechanisms, and account for within-subject correlation induced by cloning using cluster-robust variance estimation. Our study provides practical guidance for applying CCW in real-world comparative effectiveness studies to address immortal time bias and supports more rigorous and interpretable treatment effect estimation in TTE.
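The cloning-and-censoring step at the heart of the CCW approach can be sketched as follows. This is a schematic under stated assumptions, not the authors' implementation: the field names (`completion_day`, `followup_days`) and the 365-day grace period standing in for the 12-month completion window are illustrative.

```python
def clone_censor(subjects, grace_days=365):
    """Schematic clone-censor step: each vaccine initiator is cloned into
    both strategy arms at time zero, and a clone is artificially censored
    on the day the subject's observed behaviour deviates from that
    clone's assigned strategy."""
    rows = []
    for s in subjects:
        completed = (s["completion_day"] is not None
                     and s["completion_day"] <= grace_days)
        # "Complete the series within 12 months": non-completers are
        # censored at the end of the grace period (or end of follow-up).
        censor_complete = None if completed else min(grace_days, s["followup_days"])
        # "Remain partially vaccinated": completers are censored on the
        # day the third dose is received.
        censor_partial = s["completion_day"] if completed else None
        rows.append({"id": s["id"], "arm": "complete", "censor_day": censor_complete})
        rows.append({"id": s["id"], "arm": "partial", "censor_day": censor_partial})
    return rows
```

Inverse-probability-of-censoring weights are then estimated to correct the selection induced by this artificial censoring, and the cluster structure created by cloning (two rows per subject) is what motivates the cluster-robust variance estimation discussed above.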
DeCuir, J.; Reeves, E. L.; Weber, Z. A.; Yang, D.-H.; Irving, S. A.; Tartof, S. Y.; Klein, N. P.; Grannis, S. J.; Ong, T. C.; Ball, S. W.; DeSilva, M. B.; Dascomb, K.; Naleway, A. L.; Koppolu, P.; Salas, S. B.; Sy, L. S.; Lewin, B.; Contreras, R.; Zerbo, O.; Hansen, J. R.; Block, L.; Jacobson, K. B.; Dixon, B. E.; Rogerson, C.; Duszynski, T.; Fadel, W. F.; Barron, M. A.; Mayer, D.; Chavez, C.; Yates, A.; Kirshner, L.; McEvoy, C. E.; Akinsete, O. O.; Essien, I. J.; Sheffield, T.; Bride, D.; Arndorfer, J.; Van Otterloo, J.; Natarajan, K.; Ray, C. S.; Payne, A. B.; Adams, K.; Flannery, B.; Garg,
Background: The 2024-25 influenza season was the most severe in the United States (US) since 2017-18, with co-circulation of both influenza A virus subtypes (H1N1 and H3N2). Influenza vaccine effectiveness (VE) has varied by season, setting, and patient characteristics. Methods: Using electronic healthcare encounter data from eight US states, we evaluated VE against influenza-associated hospitalizations and emergency department or urgent care (ED/UC) encounters from October 2024-April 2025 among children aged 6 months-17 years and adults aged 18+ years. Using a test-negative, case-control design, we compared the odds of influenza vaccination between acute respiratory illness (ARI) encounters with a positive (cases) versus negative (controls) test for influenza by molecular assay, adjusting for confounders. Results: Analyses included 108,618 encounters (5,764 hospitalizations and 102,854 ED/UC encounters) among children and 309,483 encounters (76,072 hospitalizations and 233,411 ED/UC encounters) among adults. Among children across care settings, 17.0% (6,097/35,765) of cases versus 29.4% (21,449/72,853) of controls were vaccinated. Among adults, 28.2% (21,832/77,477) of cases versus 44.2% (102,560/232,006) of controls were vaccinated. VE was 51% (95% confidence interval [95% CI]: 41-60%) against influenza-associated hospitalizations and 54% (95% CI: 52-55%) against influenza-associated ED/UC encounters among children. VE was 43% (95% CI: 41-46%) against influenza-associated hospitalizations and 49% (95% CI: 47-50%) against influenza-associated ED/UC encounters among adults. Conclusions: Influenza vaccination provided protection against influenza-associated hospitalizations and ED/UC encounters among children and adults in the US during the severe 2024-25 influenza season. These findings support influenza vaccination as an important tool to reduce influenza-associated disease.
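As a numeric sanity check on the test-negative design described above, a crude (unadjusted) VE can be recovered from the reported vaccination fractions; the published estimates additionally adjust for confounders, so this sketch only approximates them.

```python
def crude_ve(vacc_cases, total_cases, vacc_controls, total_controls):
    """Crude VE from a test-negative design: VE = (1 - OR) * 100, where
    OR is the odds ratio of vaccination in test-positive cases versus
    test-negative controls."""
    odds_cases = vacc_cases / (total_cases - vacc_cases)
    odds_controls = vacc_controls / (total_controls - vacc_controls)
    return (1 - odds_cases / odds_controls) * 100

# Pediatric encounters across care settings, from the abstract:
# 17.0% (6,097/35,765) of cases vs 29.4% (21,449/72,853) of controls.
ve_children = crude_ve(6097, 35765, 21449, 72853)  # roughly 51%
```

The crude pediatric estimate lands close to the adjusted 51-54% range reported above; the gap between crude and adjusted estimates reflects the confounder adjustment in the published models.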
Desgraupes, S.; Boireau, S.; Khalil, M.; Aouinti, S.; Nisole, S.; Bollore, K.; Barbaria, W.; Barzaghi, F.; Dilena, R.; Boon, M.; Lunsing, R. J.; Tuaillon, E.; Westerholm-Ormio, M.; Deiva, K.; Bakker, D. P.; Kuijpers, T. W.; Yeh, E. A.; Lim, M.; Picot, M. C.; Meyer, P.; Arhel, N. J.
Background: Acute necrotizing encephalopathy (ANE) is a rare and severe neurologic complication of viral infection in children, thought to result from a hyperacute cytokine storm causing blood-brain barrier disruption and central nervous system injury. Despite characteristic clinical and radiologic features, ANE remains poorly understood at the molecular level, with no validated biomarkers or targeted therapies. We aimed to determine whether genetic predisposition to ANE due to RANBP2 variants is associated with a distinct immunologic signature. Methods: We conducted a prospective biological study of familial ANE (ANE1, NCT06731790). We included 23 heterozygous carriers of the RANBP2 c.1754C>T (p.Thr585Met) variant from 10 families, and 28 noncarriers (median age, 40 years [range, 4-72]). Soluble immune mediators, transcriptomic analyses, multiparameter flow cytometry, and cellular imaging were analysed in peripheral blood mononuclear cells (PBMCs) and monocytes. Baseline and resiquimod-stimulated immune responses were analysed within the same statistical model, with genetic status as the primary predictor. Findings: The RANBP2 Thr585Met mutation was associated with a dysregulated inflammatory phenotype characterized by reduced basal mediator production and exaggerated TNF-α responses following stimulation (estimated difference, +2,098 pg/mL; 95% CI, 1,121 to 3,076; P=0.0001). Transcriptomic and flow cytometry analyses showed broad reprogramming of myeloid cells with enrichment of CXCR3-high CD14-high subsets. Expansion of these populations was associated with increased long-term disease burden. The RANBP2 variant was the only independent factor associated with this inflammatory phenotype. Interpretation: RANBP2-associated ANE is characterised by a distinct immunological signature that can inform disease stratification and support the development of targeted immunotherapeutic approaches.
Yamga, E.; Murphy, S.; Despres, P.
Background: Electronic health record (EHR) phenotyping underpins observational research, cohort discovery, and clinical trial screening. Large language models (LLMs) offer new capabilities for extracting phenotypes from unstructured text, but their performance depends on pipeline design choices, including prompting, text segmentation, and aggregation. No systematic framework has previously examined how these parameters shape accuracy and reproducibility. Methods: We evaluated LLM-based phenotyping pipelines using 1,388 discharge summaries across 16 clinical phenotypes. A full factorial experiment with LLaMA-3B, 8B, and 70B systematically varied three pipeline components: prompting (zero-shot, few-shot, chain-of-thought, extract-then-phenotype), chunking (none, naive, document-based), and aggregation (any-positive, two-vote, majority), yielding 24 configurations per model. To compare intrinsic model capabilities, biomedical domain-adapted, commercial frontier (LLaMA-405B, GPT-4o, Gemini Flash 2.0), and reasoning-optimized models (DeepSeek-R1) were evaluated under a fixed configuration. Performance was assessed using precision, recall, and macro-F1; secondary analyses examined prediction consistency (Shannon entropy), self-confidence calibration, and a taxonomy of recurrent model errors. Results: Factorial ANOVAs showed that chunking and aggregation were the dominant drivers of performance, whereas the prompting strategy contributed minimally. Configuration effects were stable across model sizes, with no significant Model x Parameter interactions. Phenotype difficulty varied substantially (macro-F1 = 0.40-0.90), yet the highest-performing configuration, whole-document inference without aggregation, was consistent across phenotypes, as confirmed by mixed-effects modeling. In cross-model comparisons, DeepSeek-R1 achieved the highest macro-F1 (0.89), while LLaMA-70B matched GPT-4o and LLaMA-405B at substantially lower cost. Prediction entropy was low overall and driven primarily by phenotype difficulty rather than prompting or temperature. Self-confidence calibration was only moderately informative: high-confidence predictions were more accurate, but larger models exhibited systematic overconfidence. Conclusions: LLM performance in EHR phenotyping is governed primarily by input structure and model capacity, not prompt engineering. Simple, document-level inference yields robust performance across diverse phenotypes, providing practical design guidance for LLM-based cohort identification while underscoring the continued need for human oversight for challenging phenotypes.
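The prediction-consistency measure used above (Shannon entropy over repeated predictions for the same note) can be sketched minimally; the function and data below are illustrative, not the authors' code.

```python
from collections import Counter
from math import log2

def prediction_entropy(labels):
    """Shannon entropy (bits) of repeated predictions for one note:
    0.0 means fully consistent; 1.0 is the maximum for a binary call."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# Five repeated runs on the same discharge summary (illustrative data):
stable = prediction_entropy(["pos", "pos", "pos", "pos", "pos"])    # 0.0
unstable = prediction_entropy(["pos", "pos", "pos", "pos", "neg"])  # ~0.72
```

Low entropy across repeated runs corresponds to the "prediction consistency" the abstract reports, with instability concentrating on harder phenotypes rather than on prompting or temperature choices.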
Chen, B.; Zambrana, J. V.; Shotwell, A.; Sanchez, N.; Plazaola, M.; Ojeda, S.; Lopez, R.; Stadlbauer, D.; Kuan, G.; Balmaseda, A.; Krammer, F.; Gordon, A.
Background: Although the hemagglutination inhibition (HAI) titer remains the gold standard correlate of protection against influenza, it does not fully capture the broader antibody responses that contribute to immunity. Methods: We analyzed immune responses in paired pre-infection and convalescent sera from 306 RT-PCR-confirmed A/H3N2 infections from two household studies (2014-18) in Managua, Nicaragua. Antibody responses were measured by HAI and enzyme-linked immunosorbent assays (ELISAs) against full-length hemagglutinin (HA), the HA stalk, and neuraminidase (NA). Participants were classified as HAI responders (≥4-fold HAI rise), alternate responders (no HAI rise but ≥4-fold boost in ≥1 ELISA), or no-response individuals (no ≥4-fold rise in any assay). We compared demographic, clinical, and pre-infection antibody characteristics across these groups. We also analyzed predictors of an NA response. Results: Overall, 77% of participants had HAI seroconversion or a 4-fold rise. Among the 23% HAI non-responders, 62% had alternate antibody responses. No-response individuals had the highest pre-infection HAI and full-length HA titers (p < 0.0001), the lowest viral loads, and the fewest fever or influenza-like illness (ILI) symptoms (p < 0.01). An NA response was more common among symptomatic individuals (p = 0.0483) and those with low or high baseline NA titers. Conclusions: High baseline HAI titers can limit detectable 4-fold rises and are associated with milder illness. Evaluating additional immune responses may capture a more complete picture of the host response to infection, thereby improving surveillance and informing vaccine development. Keywords: influenza A/H3N2; hemagglutination inhibition (HAI); neuraminidase antibodies; symptomatic vs. asymptomatic infection; correlates of protection.
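The three response categories defined in the Methods above reduce to a simple decision rule over fold-rises; a minimal sketch (the function name and argument layout are illustrative, not from the study):

```python
def classify_responder(hai_fold_rise, elisa_fold_rises):
    """Classify a serologic response per the study's definitions:
    HAI responder (>=4-fold HAI rise); alternate responder (no HAI rise
    but >=4-fold boost in >=1 ELISA against full-length HA, HA stalk,
    or NA); no response (no >=4-fold rise in any assay)."""
    if hai_fold_rise >= 4:
        return "HAI responder"
    if any(fold >= 4 for fold in elisa_fold_rises):
        return "alternate responder"
    return "no response"
```

The rule is hierarchical: an HAI rise takes precedence, and the ELISA panel is only consulted when the HAI criterion fails, mirroring how the abstract describes the 62% of HAI non-responders who still showed alternate antibody responses.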
Garcia Quesada, M.; Wallrafen-Sam, K.; Kiti, M. C.; Ahmed, F.; Aguolu, O. G.; Ahmed, N.; Omer, S. B.; Lopman, B. A.; Jenness, S. M.
Non-pharmaceutical interventions (NPIs) have been important for controlling SARS-CoV-2 transmission, particularly before and during the initial vaccine rollout. During the pandemic, the US Centers for Disease Control and Prevention issued isolation and masking guidance in case of COVID-19-like illness, a positive SARS-CoV-2 test, or known exposure to SARS-CoV-2. However, the impact of this guidance on mitigating transmission in office workplaces is unclear. We used a network-based mathematical model to estimate the impact of this guidance on SARS-CoV-2 transmission among office workers and their communities. The model represented social contacts in the home, office, and community. We used data from the CorporateMix study to parametrize social contacts among office workers and calibrated the model to represent the COVID-19 epidemic in Georgia, USA from January 2021 through August 2022. In the reference scenario (58% adherence to guidance among office workers and the broader population), workplace transmission accounted for a small fraction of total infections. Reducing adherence among office workers to 0% increased workplace transmissions by 27.1%, while increasing adherence to 75% reduced workplace transmission by 7.0%. Increasing adherence to 75% among office workers had minimal impact on symptomatic cases and deaths; increasing it among the broader population was more effective in reducing office worker cases and deaths. In our model, moderate adherence to recommended NPIs in workplaces was effective in reducing transmission, but increasing adherence had limited benefit given these workplaces' low contact intensity and hybrid work arrangements. These results underscore the public health benefits of community-wide adoption of recommended NPIs.
James-Pemberton, P.; Harper, D.; Wagerfield, P.; Watson, C.; Hervada, L.; Kohli, S.; Alder, S.; Shaw, A.
A multiplex diagnostic test is evaluated for self-reported long COVID, associated persistent symptoms, and poor recovery from SARS-CoV-2 infection. Mass-standardised concentrations of total antibodies (AC), high-quality (HQ) antibodies, and the percentage of HQ antibodies (HQ%) were assessed against a spectrum of spike proteins from the SARS-CoV-2 variants Wuhan, δ, and the Omicron variants BA.1, BA.2, BA.2.12.1, BA.2.75, BA.5, CH.1.1, BQ.1.1 and XBB.1.5 in three cohorts. These included a cohort of recovered control patients (CC; n = 46) and a cohort of self-declared long COVID patients (LCC; n = 113). A nested receiver operating characteristic (ROC) analysis, performed for the variant with the lowest HQ concentration in the spectrum, produced an area under the curve (AUC) of 0.61 (0.53-0.70) for the CC vs LCC cohorts. For the LCC cohort, cut-off thresholds of AC = 0.8 mg/L, HQ = 1.5 mg/L and HQ% = 34% were determined, yielding 71% sensitivity and 66% specificity by the Youden metric. Based on ROC and outlier analysis, the cohorts can be fully classified, giving an incidence of persistent virus of 62% (95% CI 52%-71%), hyperimmune of 12% (95% CI 7%-20%), and unclassified of 26% (95% CI 18%-35%). The overall diagnostic accuracy for both the hyper- and hypoimmune groups is 69%. Clinical interventions can now be tailored to the heterogeneous long COVID patient cohort.
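The Youden metric used above to set cut-off thresholds maximizes J = sensitivity + specificity - 1 over candidate cut-offs. A minimal sketch, assuming higher marker values indicate the long COVID class; the scores below are toy values, not the study's data:

```python
def best_youden_cutoff(pos_scores, neg_scores, cutoffs):
    """Return (cutoff, J, sensitivity, specificity) maximizing Youden's
    J = sensitivity + specificity - 1, calling a score >= cutoff
    positive."""
    best = None
    for c in cutoffs:
        sens = sum(s >= c for s in pos_scores) / len(pos_scores)
        spec = sum(s < c for s in neg_scores) / len(neg_scores)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (c, j, sens, spec)
    return best
```

In ROC terms, the Youden-optimal cut-off is the point on the curve farthest (vertically) from the chance diagonal, which is how a single operating point such as the 71%/66% sensitivity/specificity pair is extracted from a continuous marker.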
Pasin, C.; Jackson, S. S.; Thynne, L.-E.; McWade, B.; Westerman, T.; Ball, R.; Kavanagh, J.; O'Callaghan, S.; Ring, K.; Orkin, C.; Berner, A. M.
Objectives: To estimate current, and 5- and 10-year projected, numbers of cancer cases per year in transgender and gender diverse (TGD) people in England, overall and by tumour type, accounting for uptake of gender-affirming care (GAC). Design: Population-based epidemiological modelling study using an age-stratified Monte Carlo simulation approach and the NORDPRED method for predictions. Setting: Models estimating cancer case numbers for TGD people in England, based on publicly available 2023 cancer surveillance data and survey-based 2025 GAC access, with predictions at 5 and 10 years hence. Participants: TGD people aged 15 years and above. Main outcome measures: Primary cancer cases per year overall, by gender, age group, tumour type, and current and planned GAC. Results: The estimated TGD population size in England is 441,547 (95% uncertainty interval (UI) 429,207-452,890). Total cancer cases per year in TGD people are expected to be 966 (95% UI 882-1069), excluding non-melanoma skin cancer. Most cases are expected to occur in people aged 60-64. The top five expected cancers in TGD people are breast (19%, n = 187, 95% UI 149-241), colorectal (12%, n = 117, 95% UI 106-129), lung (11%, n = 108, 95% UI 96-122), melanoma (7.1%, n = 69, 95% UI 64-74) and urinary (6.2%, n = 60, 95% UI 54-67). Total cancer cases in TGD people are estimated to reach 1740 (95% UI 1584-1934) in 5 years and 2258 (95% UI 2066-2507) in 10 years (excluding non-melanoma skin cancer). If TGD people were able to access their planned level of GAC, these figures would fall to 1555 (95% CI 1386-1766) and 2012 (95% CI 1797-2282) respectively. Conclusions: This study provides predictions of cancer cases in TGD people in England, supporting the planning of service provision and training. This is vital: with increasing disclosure and long wait times for GAC, cancer cases in TGD people are predicted to increase.
Summary boxes. What is already known on this topic: The annual number of cancer cases in transgender and gender diverse (TGD) people in England is currently unknown, as gender incongruence is not collected as part of the National Cancer Registration and Analysis Service. Some gender-affirming care (GAC) interventions are known to modulate cancer risk. Use of testosterone and chest reconstruction in transmasculine people is known to reduce their incidence of breast cancer compared to cisgender women. Use of oestradiol alongside medical or surgical androgen suppression has been shown to reduce the incidence of prostate cancer in transfeminine people while increasing their risk of breast cancer, compared to cisgender men. What this study adds: This study found that there are likely to be approximately 966 cases of cancer (excluding non-melanoma skin) in TGD people per year in the UK. Though total annual cancer cases in TGD people are expected to reach 2258 in 10 years, improved access to gender-affirming care could reduce the total to 2012 (an 11% reduction). These figures provide additional justification for funding to improve access to GAC via the National Health Service (NHS), as well as for training on the oncological needs of this population.
Sun, S.; Cai, C. X.; Fan, R.; You, S.; Tran, D.; Rao, P. K.; Suchard, M. A.; Wang, Y.; Lee, C. S.; Lee, A. Y.; Zhang, L.
Multimodal learning has the potential to improve clinical prediction by integrating complementary data sources, but the incremental value of imaging beyond structured electronic health record (EHR) data remains unclear in real-world settings. We developed a multimodal survival modeling framework integrating optical coherence tomography (OCT) and EHR data to predict time to visual improvement in patients with diabetic macular edema (DME), and evaluated how different ophthalmic foundation model representations contribute to prognostic performance. In a retrospective cohort of 973 patients (1,450 eyes) receiving anti-vascular endothelial growth factor therapy, we compared multimodal models combining 22,227 EHR variables with 196,402 OCT images, with OCT embeddings derived from three ophthalmic foundation models (RETFound, EyeCLIP, and VisionFM). The EHR-only model showed minimal prognostic discrimination (C-index 0.50 [95% CI, 0.45-0.55]). Incorporating OCT improved performance, with the magnitude of improvement depending on the representation. EHR+RETFound achieved the strongest performance (C-index 0.59 [0.54-0.65]), followed by EHR+EyeCLIP (0.57 [0.52-0.62]) and EHR+VisionFM (0.56 [0.51-0.61]). Multimodal models, particularly EHR+RETFound, demonstrated improved risk stratification with clearer separation of Kaplan-Meier curves. Partial information decomposition revealed that prognostic information was dominated by modality-specific contributions, with OCT and EHR providing largely distinct signals and minimal shared information. The magnitude of OCT-specific contribution varied across foundation models and aligned with observed performance differences. These findings indicate that OCT provides complementary prognostic value beyond structured clinical data, but gains are modest and depend strongly on representation choice. 
Our results highlight both the promise of multimodal modeling for personalized prognosis and the need for rigorous, context-specific evaluation of foundation models in real-world clinical settings.
Claus, L.; McNamara, M.; Oser, C.; Fogle, C.; Canine, B.
Cardiovascular disease (CVD) remains the leading cause of mortality in the United States, despite being largely preventable through effective management of risk factors. This study evaluates the impact of Phase II cardiac rehabilitation (CR) on functional capacity and quality of life, using data from the Montana Outcomes Project Cardiac Rehabilitation Registry. Functional capacity improvements were assessed via the six-minute walk test (6MWT) and the Dartmouth COOP questionnaire, with statistical analyses exploring the influence of CR session attendance, demographic factors, and referring diagnoses. Results demonstrated significant gains in 6MWT distance, with a mean improvement of 330.73 feet (p < .0001), and in quality-of-life scores across all subgroups. A dose-response relationship was observed, indicating greater improvements with increased CR sessions (p < .0001), though diminishing returns appeared beyond 24-35 visits. Demographic factors and complex conditions influenced outcomes, underscoring the need for tailored strategies to enhance CR access and effectiveness. These findings highlight the critical role of CR in improving patient outcomes and emphasize the importance of addressing barriers to participation in underserved populations.
Kim, S.; Guo, Y.; Sutari, S.; Chow, E.; Tam, S.; Perret, D.; Pandita, D.; Zheng, K.
Social determinants of health (SDoH) are important for clinical care, but it remains unclear how much AI-captured social context is preserved after clinician editing in ambient documentation workflows. We retrospectively analyzed 75,133 paired ambient AI-drafted and clinician-finalized note sections from ambulatory care at a large academic health system. Using a rule-based NLP pipeline, we extracted 21 SDoH categories and quantified retention, deletion, and addition. SDoH appeared in 25.2% of AI drafts versus 17.2% of final notes. At the mention level, AI captured 29,991 SDoH mentions, of which 45.1% were deleted and 54.9% retained; clinicians added 3,583 new mentions. Insurance and marital status were most often deleted, whereas substance use and physical activity were more often retained. Deletion patterns also varied by specialty, supporting the need for specialty-aware ambient AI systems.
Laskaris, Z.; Baron, S.; Markowitz, S. B.
Objectives: Rising temperatures are a major climate-related hazard for U.S. workers, increasing heat-related illness and a broad range of occupational injuries through indirect pathways often overlooked in economic evaluations. We examined the association between temperature and occupational injury and illness and quantified heat-attributable injuries (including illnesses) and costs in New York State. Methods: We conducted a time-stratified case-crossover study of 591,257 workers' compensation (WC) claims during the warm season (2016-2024). Daily maximum temperature was linked to injury date and county and modeled using natural cubic splines, with effect modification by industry and worker characteristics. Results: Injury risk increased with temperature, becoming statistically significant at approximately 78°F. Relative to 65°F, injury odds increased to 1.06 (95% CI: 1.01-1.10) at 80°F, 1.12 (1.07-1.18) at 90°F, and 1.17 (1.11-1.23) at 95°F. Overall, 5.0% of claims (2,322 annually) were attributable to heat. At temperatures ≥80°F, an estimated 1,729 excess injuries occurred annually, generating approximately $46 million in WC costs. An estimated $3.2 million to $36.1 million in medical expenditures were associated with incomplete claims, likely borne outside the WC system. Conclusions: These findings demonstrate substantial economic costs not fully captured within WC and support workplace heat protections as a cost-containment strategy that can reduce health care spending and strengthen workforce resilience.
Martin, C. M.; Henderson, I.; Campbell, D.; Stockman, K.
Background: The instability-plasticity framework proposes that multimorbidity trajectories periodically enter instability phases that are vulnerable to escalation but also potentially modifiable through relational intervention. Whether such phases commonly resolve without acute care, or predominantly progress to hospitalisation, has not been quantified at scale. Objective: To quantify instability window outcomes across a longitudinal monitoring cohort; to test whether the characteristics distinguishing admitted from resolved windows reflect within-patient trajectory dynamics or between-patient severity; and to characterise which patient-reported and operator-rated signals reliably precede admission, using both a curated pilot sub-cohort and the full monitoring cohort with an explicit cross-cohort comparison. Methods: Two complementary analyses were conducted on data from the MonashWatch Patient Journey Record (PaJR) relational telehealth system. Instability windows were identified algorithmically (>=2 consecutive calls with Total_Alerts >=3) across the full longitudinal dataset (16,383 calls, 244 patients, 2.5 years) and classified by linkage to ED and hospital admission data. Window characteristics were compared at window, patient, and paired within-patient levels. Pre-admission signal cascades were analysed in two configurations: a curated pilot sub-cohort (64 patients, 280 calls, +/-10-day window, 103 admissions, December 2016-September 2017) and the full monitoring cohort (175 patients, 1,180 pre-admission calls, +/-14-day window, December 2016-July 2019). A three-way cross-cohort comparison decomposed differences between the two configurations into pipeline and population effects. Results: 621 instability windows were identified across 157 patients (64% of the monitored cohort). 67.3% resolved without hospital admission or ED attendance, a rate stable across alert thresholds 1-5. 
In paired within-patient analysis (n = 70), duration in days (p = 0.002) and multi-domain breadth (p < 0.001) distinguished admitted from resolved windows; alert intensity did not. In the pilot sub-cohort, patient-reported illness prognosis (Q21) was the dominant pre-admission signal (GEE beta = +0.058, AUC = 0.647, p-BH = 0.018). This finding did not replicate in the full cohort: Q21 was non-significant (GEE beta = -0.008, p = 0.154, AUC = 0.507). Cross-cohort analysis identified selective curation of the pilot sub-cohort as the primary explanation. In the full cohort, six signals escalated significantly before admission after Benjamini-Hochberg correction: total alerts, health impairment (Q26), red alerts, self-rated health (Q3), patient concerns (Q1), and operator concern (Q34). Health impairment achieved the highest individual AUC (0.605) and showed the longest pre-admission lead. No individual signal exceeded an AUC of 0.61. Conclusions: Two-thirds of instability phases resolve without hospitalisation, providing direct empirical support for trajectory plasticity as a clinically frequent phenomenon. Within the same patient, persistence, both in duration and in the consistency of high-severity multi-domain flagging across calls, distinguishes trajectories that tip into admission from those that resolve. The Q21 signal reversal between cohorts illustrates how selective curation can produce compelling but non-replicable findings in monitoring research. In the full population, objective alert signals and operator judgement, rather than patient illness prognosis, carry the pre-admission signal.
Oliveira Roster, K. I.; Rönn, M. M.; Gorenburg, E. R.; Partl, D. K.; Anderegg, N.; Abel zur Wiesch, P.; Au, C.; Kouyos, R. D.; Martinez, F. P.; Low, N.; Grad, Y. H.
Numerous factors may influence the optimal rollout of new gonococcal antibiotics. We compared eight rollout strategies using a gonorrhea transmission model and ranked strategies by the number of gonococcal infections and clinically useful antibiotic lifespan. Rankings were most sensitive to the starting ceftriaxone resistance prevalence and screening frequency.
Nkosi-Mjadu, B. E.
Background: South Africa's public healthcare system serves most of the population through approximately 3,900 primary healthcare clinics characterised by long waiting times and high volumes of repeat-prescription visits. No published pre-arrival digital triage system operates across all 11 official South African languages while aligning with the South African Triage Scale (SATS). This paper reports the design and preliminary safety validation of BIZUSIZO, a hybrid deterministic-AI WhatsApp triage system. Methods: BIZUSIZO delivers SATS-aligned triage via WhatsApp, combining AI-assisted free-text classification (Claude Haiku 4.5) with a Deterministic Clinical Safety Layer (DCSL) that overrides AI output for 53 clinical discriminator categories (14 RED, 19 ORANGE, 20 YELLOW) coded in all 11 official languages and independent of AI availability. A five-domain risk factor assessment can only upgrade the triage level. One hundred and twenty clinical vignettes in patient language (English, isiZulu, isiXhosa, Afrikaans; 30 per language) were scored against a developer-assigned gold standard with independent blinded nurse review. A 121-vignette multilingual DCSL safety consistency check across all 11 languages and a 220-call post-hoc framing sensitivity evaluation (110 paired vignettes) were also conducted. Results: Under-triage was 3.3% (4/120; 95% CI: 0.9%-8.3%) with no RED under-triage; exact concordance was 80.0% (96/120) and quadratic weighted kappa 0.891 (95% CI: 0.827-0.932). One two-level under-triage was observed on a non-RED presentation (V072, isiXhosa burns vignette, ORANGE→GREEN); one two-level over-triage was observed (V054, isiZulu deep laceration, YELLOW→RED). In the framing sensitivity evaluation, AI-only classification achieved 50.9% RED invariance under adversarial framing; full-pipeline classification achieved 95.0% in four validated languages, with the DCSL rescuing 18 of 23 AI drift cases.
Conclusions: A hybrid deterministic-AI triage system with DCSL-based emergency detection achieved zero RED under-triage and consistent RED detection across all 11 official languages. The 16.7% over-triage rate falls within published South African SATS ranges (13.1%-49%). A single two-level under-triage event was observed on an isiXhosa burns vignette (ORANGE→GREEN) and is discussed in Limitations. Findings are preliminary; prospective validation against independent nurse triage is the necessary next step.
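As described, the deterministic safety layer dominates the AI classification and the risk-factor assessment can only upgrade, never downgrade. One plausible reading of that merge logic can be sketched as follows; all names, the severity ordering, and the max-severity rule are illustrative assumptions, since the actual BIZUSIZO implementation is not published:

```python
# Hypothetical sketch of an upgrade-only triage merge (not the BIZUSIZO code).
SEVERITY = {"GREEN": 0, "YELLOW": 1, "ORANGE": 2, "RED": 3}
LEVELS = {v: k for k, v in SEVERITY.items()}

def final_triage(ai_level, dcsl_matches, risk_upgrade=None):
    """Deterministic discriminator matches override the AI upward;
    the risk-factor assessment can only raise the result further."""
    candidates = [SEVERITY[ai_level]] + [SEVERITY[m] for m in dcsl_matches]
    level = max(candidates)  # a matched discriminator is never out-voted downward
    if risk_upgrade is not None:
        level = max(level, SEVERITY[risk_upgrade])  # upgrade-only by design
    return LEVELS[level]
```

Under this reading, a RED discriminator match yields RED regardless of AI output or availability, which is consistent with the zero RED under-triage the abstract reports.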
Bahig, S.; Oughton, M.; Vandesompele, J.; Brukner, I.
In dense urban settings, delays between diagnostic sampling and effective isolation can sustain transmission during peak infectiousness. We define a waiting-window transmission externality arising when infectious individuals remain mobile while awaiting results, formalized as E = N·P·TR·D, where N is daily testing volume, P test positivity, TR transmission during the waiting period, and D turnaround time. Using Monte Carlo simulation and a susceptible-infectious-recovered (SIR) framework, we quantify excess infections per 1,000 tests/day under multiple diagnostic workflows. A surge scenario incorporates positive coupling between TR and D (ρ = 0.45), reflecting co-occurrence of laboratory saturation and elevated contacts during system stress. Under centralized 48-hour workflows, excess infections reach ≈80 at P = 10% and ≈401 at P = 50%, increasing to ≈628 under surge conditions. In contrast, near-patient rapid testing and home sampling reduce this to ≈5 and ≈25-26, respectively. Workflows that eliminate the waiting window--either through immediate isolation at sampling or through home-based PCR that returns results at the point of collection--effectively collapse the transmission term. These findings identify diagnostic delay as a modifiable driver of epidemic dynamics. Operational redesign of testing workflows, including decentralized sampling and home-based molecular diagnostics, offers a scalable pathway to improve epidemic controllability and reduce inequities in dense urban environments.
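The headline numbers are consistent with evaluating E = N·P·TR·D directly. A quick check, assuming TR is expressed per waiting day (the value 0.4 below is an illustrative assumption chosen to reproduce the reported ≈80, not a parameter taken from the paper):

```python
def waiting_window_excess(N, P, TR, D):
    """Waiting-window transmission externality: E = N * P * TR * D."""
    return N * P * TR * D

# 1,000 tests/day, 10% positivity, 48 h (2 days) centralized turnaround.
E = waiting_window_excess(N=1_000, P=0.10, TR=0.4, D=2.0)  # -> 80.0
```

The multiplicative form makes the policy levers explicit: halving turnaround D halves E regardless of positivity, which is why workflows that collapse D toward zero (isolation at sampling, home-based PCR) nearly eliminate the term.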
Piorkowska, N. J.; Olejnik, A.; Ostromecki, A.; Kuliczkowski, W.; Mysiak, A.; Bil-Lula, I.
Interpreting machine learning models typically relies on feature attribution methods that quantify the contribution of individual variables to model predictions. However, it remains unclear whether attribution magnitude reflects the true functional importance of features for model performance. Here, we present a unified interpretability framework integrating permutation-based attribution, feature ablation, and stability under perturbation across multiple feature spaces. Using nested cross-validation and permutation-based null diagnostics, we systematically evaluate the relationship between attribution magnitude and functional dependence in clinical and biomarker-based prediction models. Attribution magnitude is frequently misaligned with functional importance, with weak to strong negative correlations observed across feature spaces (Spearman ρ ranging from -0.374 to -0.917). Features with high attribution often have limited impact on model performance when removed, whereas features with low attribution can be essential for maintaining predictive accuracy. These discrepancies define distinct classes of interpretability failure, including attribution excess and latent dependence. Interpretability further depends on feature space composition, and stable, functionally relevant features are not necessarily those with the highest attribution scores. By integrating attribution, functional impact, and stability into a composite Feature Reliability Score, we identify features that remain informative across perturbations and analytical contexts. These findings indicate that interpretability does not arise from attribution magnitude alone but is better characterized by stability under perturbation. This framework provides a basis for more robust model interpretation and highlights limitations of attribution-centric approaches in high-dimensional and correlated data settings.
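The attribution-versus-ablation gap described here is easy to reproduce with correlated features: a fitted model can lean heavily on one of two near-duplicate columns (high permutation attribution), yet dropping that column and refitting costs almost nothing because its correlated twin substitutes for it. A toy illustration with a linear model and synthetic data (this is not the authors' pipeline or data, only a minimal demonstration of the effect):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000
x1 = rng.normal(size=n)
x2 = x1 + 0.3 * rng.normal(size=n)      # correlated near-duplicate of x1
x3 = rng.normal(size=n)
y = x1 + 0.5 * x3 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

def mse(pred, y):
    return float(np.mean((pred - y) ** 2))

def perm_importance(X, y, j, rng):
    """Attribution-style score: permute column j, keep the fitted weights."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    base = mse(X @ w, y)
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    return mse(Xp @ w, y) - base

def ablation_importance(X, y, j):
    """Functional score: drop column j entirely and refit."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    base = mse(X @ w, y)
    Xd = np.delete(X, j, axis=1)
    wd, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return mse(Xd @ wd, y) - base

# x1 looks critical to the fitted model (large permutation score) but is
# nearly redundant in practice (small ablation score), because x2 stands in.
high_attr = perm_importance(X, y, 0, rng)
low_ablation = ablation_importance(X, y, 0)
```

This is the "attribution excess" pattern in miniature: the permutation score reflects the weights of the fixed model, while the ablation score reflects what the model family can recover from the remaining features.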
Sawadogo, J. W.; Hema, A.; Diarra, A.; Kabore, J. M.; Hien, D.; Kouraogo, L.; Zou, A. R.; Ouedraogo, A. Z.; Tiono, A. B.; Datta, S.; Pasetti, M. F.; Neuzil, K. M.; Sirima, S. B.; Ouedraogo, A.; Laurens, M. B.
Typhoid fever remains a significant public health challenge in low- and middle-income countries. In 2018, the World Health Organization recommended a single-dose typhoid conjugate vaccine (TCV) for routine immunization in endemic settings; however, evidence guiding booster doses remains limited. Homologous TCV booster doses have demonstrated immune boosting. This study assessed the immunogenicity and safety of a heterologous booster using a Vi capsular polysaccharide-CRM197 TCV (Vi-CRM) administered 5-6 years after primary vaccination with a Vi capsular polysaccharide tetanus toxoid TCV (Vi-TT) in children. Children previously enrolled in a Phase 2 trial were recruited. Participants who had received TCV at 9-11 or 15-23 months were given a Vi-CRM booster at 6-7 years of age (Booster-TCV group), and controls received their first TCV dose at the same age (1st-TCV group). Serum anti-Vi IgG concentrations were measured at baseline and 28 days post-vaccination. Solicited and unsolicited adverse events (AEs) and serious adverse events (SAEs) were recorded. Among 147 children enrolled, 87 received a second and 60 received a first TCV dose. Baseline anti-Vi IgG geometric mean titers (GMT) were higher in the Booster-TCV group (21.5 EU/mL; 95% CI: 17.2-26.8) than in the 1st-TCV group (5.5 EU/mL; 95% CI: 4.5-6.7). At day 28, GMTs rose markedly in both groups: 5140.0 EU/mL (95% CI: 4302.0-6141.3) in the Booster-TCV group and 2084.8 EU/mL (95% CI: 1724.4-2520.5) in the 1st-TCV group. Local reactions and systemic AEs were mild. No SAEs were observed. Vi-TT-induced immunity persisted for at least 5-6 years, and a heterologous booster triggered a strong immune response with universal seroconversion. These findings support heterologous prime-boost strategies to maintain protection in school-age children and inform optimization of TCV schedules in endemic regions.
Dornisch, A.; Rojo Domingo, M.; Alexander, R. V.; Conlin, C. C.; Do, S.; McKay, R. R.; Moiseenko, V.; Liss, M. A.; Liu, J.; Pawlicki, T.; Pena, S.; Qiao, E. M.; Rose, B. S.; Rupareliya, R.; Sandhu, A. P.; Scholey, J.; Seyedin, S. N.; Urbanic, J. J.; Wei, L.-J.; Seibert, T. M.
Definitive radiotherapy (RT) for prostate cancer (PC) with dose intensification and/or focal boosting has excellent oncologic outcomes, but many patients experience adverse events. Dose escalation to the whole prostate improves outcomes at the expense of increased late adverse events. Intraprostatic recurrence after definitive RT typically occurs at the site of the primary tumor, suggesting that dose to the site of the dominant lesion is an important predictor of future failure. The efficacy and safety of tumor-focused RT compared to that of standard RT for definitive treatment of localized PC have not been assessed. RadTARGET (RAdiation Dose TAiloRing Guided by Enhanced Targeting) is a phase II randomized trial that aims to demonstrate superior safety of image-guided, tumor-focused RT compared to standard RT with respect to acute genitourinary (GU) and gastrointestinal (GI) adverse events in the setting of definitive RT for intermediate- and high-risk PC. The study intervention is image-guided, tumor-focused RT with dose intensification of cancer visible on imaging and dose de-intensification to the remaining prostate. Patients will be randomized to two arms: those who receive standard RT dose and those who receive tumor-focused RT. The study population will be patients with intermediate- or high-risk PC planning to undergo definitive RT with or without systemic therapy. The primary endpoint to compare between randomized arms is acute GU or GI grade ≥2 adverse events. Participant and study duration are 5 years and 8 years, respectively. RadTARGET will compare the efficacy and safety of tumor-focused RT to that of standard RT for definitive treatment of localized PC. We hypothesize that the tumor-focused approach will substantially reduce adverse events after prostate RT while retaining high efficacy.
If this hypothesis is confirmed, we will conclude that a phase III randomized controlled trial is warranted to formally establish oncologic non-inferiority compared to the current standard of whole-gland dose escalation.